20180411 Trying to scrape photos of models

# Import the modules: BeautifulSoup (bs4), urlopen (urllib), and re (regular expressions)
from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

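# Fetch the page behind the link, decode it as UTF-8, and store the text in the variable html
# Hand html to BeautifulSoup and parse it with the lxml parser
html = urlopen("http://www.beautyleg.com/photo/show.php?no=96").read().decode('utf-8')
soup = BeautifulSoup(html, features='lxml')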
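
The fetch step assumes the page is UTF-8; if a page is served in another encoding, `.decode('utf-8')` raises `UnicodeDecodeError`. Below is a minimal sketch of a more forgiving decode step. The helper name `decode_body` and the fallback order are assumptions for illustration and are not part of the original notes.

```python
def decode_body(raw: bytes, declared_charset=None) -> str:
    """Decode response bytes, trying the declared charset first, then UTF-8."""
    for encoding in (declared_charset, 'utf-8'):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: fall through to the next candidate
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD
    return raw.decode('utf-8', errors='replace')

# Possible use with urlopen (not run here, since it needs network access):
#   response = urlopen(url)
#   html = decode_body(response.read(), response.headers.get_content_charset())
```

Trying the charset declared in the response headers first keeps the UTF-8 assumption as a fallback rather than the only option.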

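# Use BeautifulSoup's find_all to pick out what we need: every <a> tag whose href attribute matches the .jpg pattern (a regular expression)
# Loop over everything that was found and print each link
pic = soup.find_all('a', {'href': re.compile(r'.*?\.jpg')})
for link in pic:
    print(link['href'])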
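
When a compiled regular expression is passed as an attribute filter, BeautifulSoup tests each value with the pattern's `search()` method. The sketch below uses only the `re` module to show what the pattern from the notes accepts, and how an anchored, case-insensitive alternative would differ. The href values and the `strict` pattern are assumptions for illustration; they are not taken from the real page.

```python
import re

# Hypothetical href values for illustration only
hrefs = [
    "album/0001.jpg",
    "album/0002.JPG",
    "show.php?no=97",
    "thumbs/0001.jpg.html",
]

loose = re.compile(r'.*?\.jpg')        # the pattern used in the notes
strict = re.compile(r'\.jpg$', re.I)   # assumed alternative: must end in .jpg, any case

# search() succeeds if the pattern matches anywhere in the string
loose_hits = [h for h in hrefs if loose.search(h)]
strict_hits = [h for h in hrefs if strict.search(h)]

print(loose_hits)   # ['album/0001.jpg', 'thumbs/0001.jpg.html']
print(strict_hits)  # ['album/0001.jpg', 'album/0002.JPG']
```

The loose pattern also accepts links that merely contain `.jpg`, and it misses upper-case extensions. Whether that matters depends on the markup of the page being scraped.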

Bonus

# The URLs of the individual pages differ only in the number at the end
# So loop over a range, pass each number to urlopen, and repeat the same steps to print the image links on every page
m_html = 'http://www.beautyleg.com/photo/show.php?no='
for i in range(0, 101):
    all_html = urlopen(m_html + str(i)).read().decode('utf-8')
    soup = BeautifulSoup(all_html, features='lxml')
    all_pic = soup.find_all('a', {'href': re.compile(r'.*?\.jpg')})
    for a in all_pic:
        print(a['href'])
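
The loop prints each href exactly as it appears in the page. If some of those values are relative paths (an assumption; the notes do not show the actual output), they have to be resolved against the page URL before they can be downloaded. Below is a minimal sketch using `urllib.parse.urljoin`. The helper names `page_url` and `absolute_links` and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

m_html = 'http://www.beautyleg.com/photo/show.php?no='

def page_url(no: int) -> str:
    """Build the URL of one page from its number."""
    return m_html + str(no)

def absolute_links(page: str, hrefs):
    """Resolve every href against the URL of the page it was found on."""
    return [urljoin(page, h) for h in hrefs]

# Hypothetical href values: relative, root-relative, and already absolute
sample = ['img/001.jpg', '/album/002.jpg', 'http://example.com/003.jpg']
links = absolute_links(page_url(96), sample)

for link in links:
    print(link)
```

`urljoin` leaves absolute URLs untouched, so the same helper can be applied to every href without checking its form first.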
